Project Description

Lego is a well-known brand worldwide, famous for its wide range of toys, popular movies, and successful video games. In this project, we will explore an important moment in Lego’s history: the introduction of licensed sets, such as Star Wars, Super Heroes, and Harry Potter.

The launch of Lego’s first licensed series, Star Wars, was a major success and led to many more collaborations with other popular franchises. Suppose that Lego’s partnerships team has asked us to analyze this success. Before starting the analysis, we review the two datasets we will use, described below.

The Data

lego_sets.csv

Column        Description
set_num       A unique code for each Lego set. This is very important because a missing value means the set is a duplicate or invalid.
name          The name of the Lego set.
year          The year the set was released.
num_parts     The number of pieces in the set. This is not critical for our analysis, so missing values are okay.
theme_name    The name of the sub-theme the set belongs to.
parent_theme  The name of the main theme the set belongs to. This matches the name column in the parent_themes dataset.

parent_themes.csv

Column       Description
id           A unique code for each parent theme.
name         The name of the parent theme.
is_licensed  A Boolean value showing whether the theme is licensed.
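
The parent_theme column of lego_sets.csv links each set to a row of parent_themes.csv through that table's name column. As an alternative to the isin filter used later in this notebook, a merge makes the link explicit and attaches is_licensed to every set. This is a minimal sketch on hypothetical miniature tables: the set numbers and theme names are borrowed from the outputs below, but the id values and the is_licensed flag for Space are assumed.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables (id values and Space flag assumed)
sets = pd.DataFrame({
    "set_num": ["10018-1", "0012-1", "10075-1"],
    "parent_theme": ["Star Wars", "Space", "Super Heroes"],
})
themes = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Star Wars", "Space", "Super Heroes"],
    "is_licensed": [True, False, True],
})

# Join each set to its parent theme via parent_theme == name;
# validate="many_to_one" raises MergeError if a theme name is duplicated
merged = sets.merge(
    themes[["name", "is_licensed"]],
    left_on="parent_theme",
    right_on="name",
    how="left",
    validate="many_to_one",
)
print(merged[["set_num", "parent_theme", "is_licensed"]])
```

The validate check matters because a duplicated theme name would otherwise silently duplicate every set in that theme and inflate later counts.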

The Rebrickable dataset contains information about every Lego set ever sold, including set names and the bricks they contain. Although Lego bricks are small, this is a large and rich dataset. In this project, we will use this data along with the pandas library to explore the history of Lego’s licensed sets. We will also calculate what percentage of all licensed sets are themed around Star Wars.

This project will help us understand how licensed partnerships shaped Lego’s product line and contributed to its success.

What percentage of all licensed sets ever released were Star Wars themed?

In [2]:
# Import pandas and read in the DataFrame
import pandas as pd
lego_sets = pd.read_csv('lego_sets.csv')
lego_sets.head()
Out[2]:
   set_num                        name  year  num_parts    theme_name  parent_theme
0     00-1             Weetabix Castle  1970      471.0        Castle      Legoland
1   0011-2           Town Mini-Figures  1978        NaN  Supplemental          Town
2   0011-3  Castle 2 for 1 Bonus Offer  1987        NaN  Lion Knights        Castle
3   0012-1          Space Mini-Figures  1979       12.0  Supplemental         Space
4   0013-1          Space Mini-Figures  1979       12.0  Supplemental         Space
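
Before dropping anything, it can help to count the missing values in each column, since the dataset description treats a missing set_num as disqualifying but a missing num_parts as harmless. The sketch below uses a small hypothetical frame (the "Mystery Set" row is invented) rather than the real file.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame shaped like lego_sets, with deliberately missing values
toy = pd.DataFrame({
    "set_num": ["00-1", None, "0012-1"],
    "name": ["Weetabix Castle", "Mystery Set", "Space Mini-Figures"],
    "year": [1970, 1985, 1979],
    "num_parts": [471.0, np.nan, 12.0],
})

# Count missing values per column before deciding what to drop
missing = toy.isna().sum()
print(missing)
```

On the real data the same call would be lego_sets.isna().sum().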
In [3]:
# Drop rows missing any key column; num_parts is allowed to stay missing
lego_sets_clean = lego_sets.dropna(subset=['set_num', 'name', 'theme_name'])
lego_sets_clean.head()
Out[3]:
   set_num                        name  year  num_parts    theme_name  parent_theme
0     00-1             Weetabix Castle  1970      471.0        Castle      Legoland
1   0011-2           Town Mini-Figures  1978        NaN  Supplemental          Town
2   0011-3  Castle 2 for 1 Bonus Offer  1987        NaN  Lion Knights        Castle
3   0012-1          Space Mini-Figures  1979       12.0  Supplemental         Space
4   0013-1          Space Mini-Figures  1979       12.0  Supplemental         Space
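
The subset argument of dropna restricts the missing-value check to the listed columns. In this hypothetical example, the row without a set_num is removed while the row that only lacks num_parts is kept, mirroring the rules in the dataset description.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one lacks set_num (invalid), one lacks only num_parts (acceptable)
toy = pd.DataFrame({
    "set_num": ["00-1", np.nan, "0011-2"],
    "name": ["Weetabix Castle", "Unknown", "Town Mini-Figures"],
    "theme_name": ["Castle", "Supplemental", "Supplemental"],
    "num_parts": [471.0, 10.0, np.nan],
})

# Only the listed columns are checked, so NaN in num_parts survives
clean = toy.dropna(subset=["set_num", "name", "theme_name"])
print(clean)
```

The original index labels are preserved (0 and 2 here), which is why the real output keeps labels such as 44 and 95 after filtering.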
In [4]:
# Get the names of the licensed parent themes
parent_themes = pd.read_csv('parent_themes.csv')
licensed_themes = parent_themes[parent_themes['is_licensed']]['name']
licensed_themes.head()
Out[4]:
7                    Star Wars
12                Harry Potter
16    Pirates of the Caribbean
17               Indiana Jones
18                        Cars
Name: name, dtype: object
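
The filter above works because pandas parses is_licensed as a boolean column, which the successful output suggests. A sketch of the same selection written with .loc, which picks the matching rows and the name column in one step; the table is a hypothetical stand-in for parent_themes.csv.

```python
import pandas as pd

# Hypothetical stand-in for parent_themes; is_licensed must be bool for masking
themes = pd.DataFrame({
    "name": ["Space", "Star Wars", "Castle", "Harry Potter"],
    "is_licensed": [False, True, False, True],
})

# Boolean mask selects licensed rows; "name" selects the column
licensed_names = themes.loc[themes["is_licensed"], "name"]
print(licensed_names.tolist())
```

If the column were instead read as text (for example, because of missing values), it would need converting to bool before it could be used as a mask.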
In [5]:
# Subset for licensed sets
licensed = lego_sets_clean['parent_theme'].isin(licensed_themes)
licensed_sets = lego_sets_clean[licensed]
licensed_sets.head()
Out[5]:
    set_num                           name  year  num_parts               theme_name  parent_theme
44  10018-1                     Darth Maul  2001     1868.0                Star Wars     Star Wars
45  10019-1    Rebel Blockade Runner - UCS  2001        NaN  Star Wars Episode 4/5/6     Star Wars
54  10026-1        Naboo Starfighter - UCS  2002        NaN      Star Wars Episode 1     Star Wars
57  10030-1  Imperial Star Destroyer - UCS  2002     3115.0  Star Wars Episode 4/5/6     Star Wars
95  10075-1         Spider-Man Action Pack  2002       25.0               Spider-Man  Super Heroes
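
Series.isin returns a boolean Series that is True wherever a value appears in the given collection, and indexing the DataFrame with it keeps only those rows. A toy illustration with hypothetical data:

```python
import pandas as pd

# Hypothetical sets and licensed theme names
sets = pd.DataFrame({
    "set_num": ["10018-1", "0012-1", "10075-1", "00-1"],
    "parent_theme": ["Star Wars", "Space", "Super Heroes", "Legoland"],
})
licensed_names = pd.Series(["Star Wars", "Harry Potter", "Super Heroes"])

# isin compares against the Series' values (not its index)
mask = sets["parent_theme"].isin(licensed_names)
licensed_subset = sets[mask]
print(mask.tolist())
print(licensed_subset["set_num"].tolist())
```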
In [6]:
# Calculate the percentage of licensed sets that are Star Wars themed
num_licensed = len(licensed_sets)
num_star_wars = (licensed_sets['parent_theme'] == 'Star Wars').sum()
ratio = num_star_wars / num_licensed
# int() truncates the percentage toward zero instead of rounding it
the_force = int(ratio * 100)
print(f'The percentage of licensed sets that are Star Wars themed is {the_force}%.')
The percentage of licensed sets that are Star Wars themed is 51%.
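
The cell above converts the ratio with int(), which truncates rather than rounds. The sketch below, using hypothetical theme labels, computes every theme's share at once with value_counts(normalize=True) and shows a case where truncation and rounding disagree (4 of 6 sets is 66.67%).

```python
import pandas as pd

# Hypothetical parent_theme values for six licensed sets
parent = pd.Series(["Star Wars"] * 4 + ["Super Heroes", "Harry Potter"])

# Share of each parent theme among the licensed sets (values sum to 1)
shares = parent.value_counts(normalize=True)
star_wars_share = float(shares["Star Wars"])  # 4 / 6

truncated = int(star_wars_share * 100)    # drops the fractional part
rounded = round(star_wars_share * 100)    # rounds to the nearest integer
print(truncated, rounded)
```

Which convention to report is a presentation choice; whether it changes the real figure depends on the actual ratio.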

In which year was the highest number of Star Wars sets released?

In [7]:
# Create a pivot table counting licensed sets per year (rows) and parent theme (columns)
licensed_pivot = licensed_sets.pivot_table(index='year', columns='parent_theme', values='set_num', aggfunc='count')

# Find the year with the highest Star Wars count; idxmax returns that row's year label
new_era = int(licensed_pivot['Star Wars'].idxmax())
print(f'The year when the most Star Wars sets were released was {new_era}.')
The year when the most Star Wars sets were released was 2016.
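
The pivot table has one row per year and one column per parent theme, with each cell counting sets. Calling idxmax on the Star Wars column returns the year label of the largest count directly, so the answer does not have to be read off a sorted table and typed in by hand. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical licensed sets (set numbers and years invented for illustration)
sets = pd.DataFrame({
    "set_num": ["a-1", "b-1", "c-1", "d-1", "e-1", "f-1"],
    "year": [2015, 2016, 2016, 2016, 2015, 2016],
    "parent_theme": ["Star Wars", "Star Wars", "Star Wars",
                     "Super Heroes", "Super Heroes", "Star Wars"],
})

# Rows are years, columns are parent themes, cells count set_num entries
pivot = sets.pivot_table(index="year", columns="parent_theme",
                         values="set_num", aggfunc="count")

# idxmax returns the index label (the year) of the largest Star Wars count
peak_year = int(pivot["Star Wars"].idxmax())
print(pivot)
print(peak_year)
```

If two years tie, idxmax returns the first one. Years in which a theme released no sets show up as NaN in the pivot, and idxmax skips them by default.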